A novel target-driven MLLR adaptation algorithm with multi-layer structure

Authors

  • Jia Lei
  • Xu Bo
Abstract

This paper presents a novel target-driven MLLR adaptation algorithm with a multi-layer structure, based on a thorough analysis of MLLR using regression class trees. The new algorithm is built on the target-driven principle: it generates the regression classes dynamically, based on the outcome of the preceding MLLR transformation. The regression classes are defined so as to maximize the increase of the auxiliary function, which is proportional to the likelihood of the adaptation data. Because of the algorithm's special transformation structure, the computational load of performing the transformation is much reduced. Compared with conventional MLLR using regression class trees, the new algorithm gives a further 10% error reduction and takes only half the computation time.

1. INTRODUCTION

Speaker adaptation techniques try to adapt an initial speaker-independent (SI) system to obtain near speaker-dependent (SD) performance with only a small amount of speaker-specific data. Many adaptation techniques have been developed for this aim, among which MAP [1], MLLR [2] and MLLR using regression class trees [3,4] have made some progress. MAP estimation uses a prior distribution and can obtain robust parameter estimates from less data than maximum likelihood estimation (MLE). Although MAP estimation converges to the MLE estimate as the amount of adaptation data increases, its adaptation is slow because it only updates distributions for which observations occur in the adaptation data. MLLR is applied to capture the general relationship between the speaker-independent model set and the current speaker. In [2], a global linear transformation matrix is estimated to maximize the likelihood of the adaptation data, and all mean parameters of the system are then transformed by this matrix.
By using regression class trees [3,4], MLLR can be applied to a set of transform classes in which some output distributions of the HMM parameter set are tied to the same transform, based on the assumption that output distributions close together in acoustic space should be tied and transformed together. Using regression classes yields a large improvement over conventional MLLR [3]. The drawback of MLLR with regression class trees is that it relies on this assumption, which does not hold when the test speaker's acoustic properties differ greatly from those of the speakers in the acoustic model training set. In this paper, a novel algorithm is presented that finds more suitable regression classes to improve recognition accuracy. The algorithm defines the regression classes according to the increase of the auxiliary function, which is proportional to the likelihood of the adaptation data. High accuracy can be obtained in this way. In addition, the new algorithm bases its current MLLR transformation on the former one. This special multi-layer transformation structure gives the algorithm much lower computational complexity.

2. MLLR USING THE GENERATION OF REGRESSION CLASS TREES

2.1 MLLR

In the standard MLLR approach [2], the mean vectors $\mu$ of the Gaussian densities are updated using an $n \times (n+1)$ transformation matrix $W$ calculated from the adaptation data:

$$\hat{\mu}_{ijk} = W \xi_{ijk} \qquad (1)$$

Here $\xi_{ijk} = (1, \mu_{ijk})^T = (1, \mu_{ijk}(1), \mu_{ijk}(2), \ldots, \mu_{ijk}(n))^T$ is the extended mean vector, $\mu_{ijk}$ is the $k$th mean vector for the HMM transition from state $i$ to state $j$, $W$ is the transformation matrix, and $n$ is the dimension of the feature vector. For the calculation of the transformation matrix $W$, the objective function to be maximized is the likelihood of generating the observed speech frames:
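The mean update in Eq. (1) can be illustrated with a minimal sketch. It only shows how a given transformation matrix is applied to one mean vector; estimating the matrix from adaptation data is not covered. The function names `extend_mean` and `mllr_transform_mean` and the toy values of `W` and `mu` are assumptions made for illustration and do not come from the paper.

```python
from typing import List, Sequence


def extend_mean(mu: Sequence[float]) -> List[float]:
    """Build the extended mean vector xi = (1, mu(1), ..., mu(n))^T."""
    return [1.0] + [float(x) for x in mu]


def mllr_transform_mean(W: Sequence[Sequence[float]],
                        mu: Sequence[float]) -> List[float]:
    """Return the adapted mean W * xi for an n x (n+1) matrix W, as in Eq. (1)."""
    n = len(mu)
    if len(W) != n or any(len(row) != n + 1 for row in W):
        raise ValueError("W must have shape n x (n+1)")
    xi = extend_mean(mu)
    # Each adapted component is the dot product of one row of W with xi.
    return [sum(w * x for w, x in zip(row, xi)) for row in W]


# Toy example with n = 2 (hypothetical values).
# The first column of W acts as a bias; the remaining columns scale the mean.
W = [[0.5, 2.0, 0.0],
     [-1.0, 0.0, 1.0]]
mu = [1.0, 3.0]
adapted = mllr_transform_mean(W, mu)
print(adapted)  # [2.5, 2.0]
```

Writing $W$ against the extended vector lets a single matrix carry both a bias term (its first column) and a linear map (its remaining columns), so every mean tied to the same regression class can be updated with the same matrix-vector product.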

Similar resources

Multi-layer structure MLLR adaptation algorithm with subspace regression classes and tying

MLLR is a parameter transformation technique for both speaker and environment adaptation. When the amount of adaptation data is scarce, it is necessary to do adaptation with regression classes. In this paper, we present a rapid MLLR adaptation algorithm, which is called Multi-layer structure MLLR adaptation with subspace regression classes and tying (SRCMLR). The method groups the Gaussians on ...

Transformation Sharing Strategies for MLLR Speaker Adaptation

Arindam Mandal. Chair of the Supervisory Committee: Professor Mari Ostendorf, Electrical Engineering. Maximum Likelihood Linear Regression (MLLR) estimates linear transformations of automatic speech recognition (ASR) parameters and has achieved significant performance improvements in speaker-independent ASR systems by adapting to target...

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

Publication date: 2001